[Chapter Ten][Previous] [Next]
[Art of Assembly][Randall
Hyde]
Art of Assembly: Chapter Ten
- 10.3 - CASE Statements
- 10.4 - State Machines and Indirect Jumps
- 10.5 - Spaghetti Code
10.3 CASE Statements
The Pascal case
statement takes the following form :
CASE variable OF
const1:stmt1;
const2:stmt2;
.
.
.
constn:stmtn
END;
When this statement executes, it checks the value of variable against the
constants const1 ... constn
. If a match is found then the corresponding
statement executes. Standard Pascal places a few restrictions on the case
statement. First, if the value of variable isn't in the list of constants,
the result of the case
statement is undefined. Second, all
the constants appearing as case
labels must be unique. The
reason for these restrictions will become clear in a moment.
Most introductory programming texts introduce the case
statement
by explaining it as a sequence of if..then..else
statements.
They might claim that the following two pieces of Pascal code are equivalent:
CASE I OF
0: WriteLn('I=0');
1: WriteLn('I=1');
2: WriteLn('I=2');
END;
IF I = 0 THEN WriteLn('I=0')
ELSE IF I = 1 THEN WriteLn('I=1')
ELSE IF I = 2 THEN WriteLn('I=2');
While semantically these two code segments may be the same, their implementation
is usually different[1]. Whereas the if..then..else
if
chain does a comparison for each conditional statement in the
sequence, the case
statement normally uses an indirect jump
to transfer control to any one of several statements with a single computation.
Consider the two examples presented above, they could be written in assembly
language with the following code:
mov bx, I
shl bx, 1 ;Multiply BX by two
jmp cs:JmpTbl[bx]
JmpTbl word stmt0, stmt1, stmt2
Stmt0: print
byte "I=0",cr,lf,0
jmp EndCase
Stmt1: print
byte "I=1",cr,lf,0
jmp EndCase
Stmt2: print
byte "I=2",cr,lf,0
EndCase:
; IF..THEN..ELSE form:
mov ax, I
cmp ax, 0
jne Not0
print
byte "I=0",cr,lf,0
jmp EndOfIF
Not0: cmp ax, 1
jne Not1
print
byte "I=1",cr,lf,0
jmp EndOfIF
Not1: cmp ax, 2
jne EndOfIF
Print
byte "I=2",cr,lf,0
EndOfIF:
Two things should become readily apparent: the more (consecutive) cases
you have, the more efficient the jump table implementation becomes (both
in terms of space and speed). Except for trivial cases, the case
statement is almost always faster and usually by a large margin. As long
as the case
labels are consecutive values, the case
statement version is usually smaller as well.
What happens if you need to include non-consecutive case
labels
or you cannot be sure that the case
variable doesn't go out
of range? Many Pascals have extended the definition of the case
statement to include an otherwise
clause. Such a case
statement takes the following form:
CASE variable OF
const:stmt;
const:stmt;
. .
. .
. .
const:stmt;
OTHERWISE stmt
END;
If the value of variable matches one of the constants making up the case
labels, then the associated statement executes. If the variable's value
doesn't match any of the case
labels, then the statement following
the otherwise
clause executes. The otherwise
clause
is implemented in two phases. First, you must choose the minimum and maximum
values that appear in a case
statement. In the following case
statement, the smallest case
label is five, the largest is
15:
CASE I OF
5:stmt1;
8:stmt2;
10:stmt3;
12:stmt4;
15:stmt5;
OTHERWISE stmt6
END;
Before executing the jump through the jump table, the 80x86 implementation
of this case
statement should check the case
variable
to make sure it's in the range 5..15. If not, control should be immediately
transferred to stmt6:
mov bx, I
cmp bx, 5
jl Otherwise
cmp bx, 15
jg Otherwise
shl bx, 1
jmp cs:JmpTbl-10[bx]
The only problem with this form of the case
statement as it
now stands is that it doesn't properly handle the situation where I is equal
to 6, 7, 9, 11, 13, or 14. Rather than sticking extra code in front of the
conditional jump, you can stick extra entries in the jump table as follows:
mov bx, I
cmp bx, 5
jl Otherwise
cmp bx, 15
jg Otherwise
shl bx, 1
jmp cs:JmpTbl-10[bx]
Otherwise: {put stmt6 here}
jmp CaseDone
JmpTbl word stmt1, Otherwise, Otherwise, stmt2, Otherwise
word stmt3, Otherwise, stmt4, Otherwise, Otherwise
word stmt5
etc.
Note that the value 10 is subtracted from the address of the jump table.
The first entry in the table is always at offset zero while the smallest
value used to index into the table is five (which is multiplied by two to
produce 10). The entries for 6, 7, 9, 11, 13, and 14 all point at the code
for the Otherwise clause, so if I contains one of these values, the Otherwise
clause will be executed.
There is a problem with this implementation of the case
statement.
If the case
labels contain non-consecutive entries that are
widely spaced, the following case
statement would generate
an extremely large code file:
CASE I OF
0: stmt1;
100: stmt2;
1000: stmt3;
10000: stmt4;
Otherwise stmt5
END;
In this situation, your program will be much smaller if you implement the
case
statement with a sequence of if
statements
rather than using a jump statement. However, keep one thing in mind- the
size of the jump table does not normally affect the execution speed of the
program. If the jump table contains two entries or two thousand, the case
statement will execute the multi-way branch in a constant amount of time.
The if
statement implementation requires a linearly increasing
amount of time for each case
label appearing in the case
statement.
Probably the biggest advantage to using assembly language over a HLL like
Pascal is that you get to choose the actual implementation. In some instances
you can implement a case
statement as a sequence ofif..then..else
statements, or you can implement it as a jump table, or you can use
a hybrid of the two:
CASE I OF
0:stmt1;
1:stmt2;
2:stmt3;
100:stmt4;
Otherwise stmt5
END;
could become:
mov bx, I
cmp bx, 100
je Is100
cmp bx, 2
ja Otherwise
shl bx, 1
jmp cs:JmpTbl[bx]
etc.
Of course, you could do this in Pascal with the following code:
IF I = 100 then stmt4
ELSE CASE I OF
0:stmt1;
1:stmt2;
2:stmt3;
Otherwise stmt5
END;
But this tends to destroy the readability of the Pascal program. On the
other hand, the extra code to test for 100 in the assembly language code
doesn't adversely affect the readability of the program (perhaps because
it's so hard to read already). Therefore, most people will add the extra
code to make their program more efficient.
The C/C++ switch
statement is very similar to the Pascal case
statement. There is only one major semantic difference: the programmer must
explicitly place a break
statement in each case
clause to transfer control to the first statement beyond the switch
.
This break
corresponds to the jmp
instruction
at the end of each case
sequence in the assembly code above.
If the corresponding break
is not present, C/C++ transfers
control into the code of the following case
. This is equivalent
to leaving off the jmp
at the end of the case
's
sequence:
switch (i)
{
case 0: stmt1;
case 1: stmt2;
case 2: stmt3;
break;
case 3: stmt4;
break;
default:stmt5;
}
This translates into the following 80x86 code:
mov bx, i
cmp bx, 3
ja DefaultCase
shl bx, 1
jmp cs:JmpTbl[bx]
JmpTbl word case0, case1, case2, case3
case0: <stmt1's code>
case1: <stmt2's code>
case2: <stmt3's code>
jmp EndCase ;Emitted for the break stmt.
case3: <stmt4's code>
jmp EndCase ;Emitted for the break stmt.
DefaultCase: <stmt5's code>
EndCase:
10.4 State Machines and Indirect Jumps
Another control structure commonly found in assembly language programs
is the state machine. A state machine uses a state variable to control program
flow. The FORTRAN programming language provides this capability with the
assigned goto statement. Certain variants of C (e.g., GNU's GCC from the
Free Software Foundation) provide similar features. In assembly language,
the indirect jump provides a mechanism to easily implement state machines.
So what is a state machine? In very basic terms, it is a piece of code[2]
which keeps track of its execution history by entering and leaving certain
"states". For the purposes of this chapter, we'll not use a very
formal definition of a state machine. We'll just assume that a state machine
is a piece of code which (somehow) remembers the history of its execution
(its state) and executes sections of code based upon that history.
In a very real sense, all programs are state machines. The CPU registers
and values in memory constitute the "state" of that machine. However,
we'll use a much more constrained view. Indeed, for most purposes only a
single variable (or the value in the IP register) will denote the current
state.
Now let's consider a concrete example. Suppose you have a procedure which
you want to perform one operation the first time you call it, a different
operation the second time you call it, yet something else the third time
you call it, and then something new again on the fourth call. After the
fourth call it repeats these four different operations in order. For example,
suppose you want the procedure to add ax
and bx
the
first time, subtract them on the second call, multiply them on the third,
and divide them on the fourth. You could implement this procedure as follows:
State byte 0
StateMach proc
cmp state,0
jne TryState1
; If this is state 0, add BX to AX and switch to state 1:
add ax, bx
inc State ;Set it to state 1
ret
; If this is state 1, subtract BX from AX and switch to state 2
TryState1: cmp State, 1
jne TryState2
sub ax, bx
inc State
ret
; If this is state 2, multiply AX and BX and switch to state 3:
TryState2: cmp State, 2
jne MustBeState3
push dx
mul bx
pop dx
inc State
ret
; If none of the above, assume we're in State 4. So divide
; AX by BX.
MustBeState3:
push dx
xor dx, dx ;Zero extend AX into DX.
div bx
pop dx
mov State, 0 ;Switch back to State 0
ret
StateMach endp
Technically, this procedure is not the state machine. Instead, it is the
variable State
and the cmp/jne
instructions which
constitute the state machine.
There is nothing particularly special about this code. It's little more
than a case
statement implemented via theif..then..else
construct. The only thing special about this procedure is that it
remembers how many times it has been called[3]
and behaves differently depending upon the number of calls. While this is
a correct implementation of the desired state machine, it is not particularly
efficient. The more common implementation of a state machine in assembly
language is to use an indirect jump. Rather than having a state variable
which contains a value like zero, one, two, or three, we could load the
state variable with the address of the code to execute upon entry into the
procedure. By simply jumping to that address, the state machine could save
the tests above needed to execute the proper code fragment. Consider the
following implementation using the indirect jump:
State word State0
StateMach proc
jmp State
; If this is state 0, add BX to AX and switch to state 1:
State0: add ax, bx
mov State, offset State1 ;Set it to state 1
ret
; If this is state 1, subtract BX from AX and switch to state 2
State1: sub ax, bx
mov State, offset State2 ;Switch to State 2
ret
; If this is state 2, multiply AX and BX and switch to state 3:
State2: push dx
mul bx
pop dx
mov State, offset State3 ;Switch to State 3
ret
; If in State 3, do the division and switch back to State 0:
State3: push dx
xor dx, dx ;Zero extend AX into DX.
div bx
pop dx
mov State, offset State0 ;Switch to State 0
ret
StateMach endp
The jmp
instruction at the beginning of the StateMach
procedure transfers control to the location pointed at by the State
variable. The first time you call StateMach
it points at the
State0
label. Thereafter, each subsection of code sets the
State
variable to point at the appropriate successor code.
10.5 Spaghetti Code
One major problem with assembly language is that it takes several statements
to realize a simple idea encapsulated by a single HLL statement. All too
often an assembly language programmer will notice that s/he can save a few
bytes or cycles by jumping into the middle of some programming structure.
After a few such observations (and corresponding modifications) the code
contains a whole sequence of jumps in and out of portions of the code. If
you were to draw a line from each jump to its destination, the resulting
listing would end up looking like someone dumped a bowl of spaghetti on
your code, hence the term "spaghetti code".
Spaghetti code suffers from one major drawback- it's difficult (at best)
to read such a program and figure out what it does. Most programs start
out in a "structured" form only to become spaghetti code at the
altar of efficiency. Alas, spaghetti code is rarely efficient. Since it's
difficult to figure out exactly what's going on, it's very difficult to
determine if you can use a better algorithm to improve the system. Hence,
spaghetti code may wind up less efficient.
While it's true that producing some spaghetti code in your programs may
improve its efficiency, doing so should always be a last resort (when you've
tried everything else and you still haven't achieved what you need), never
a matter of course. Always start out writing your programs with straight-forward
if
s and case
statements. Start combining sections
of code (via jmp
instructions) once everything is working and
well understood. Of course, you should never obliterate the structure of
your code unless the gains are worth it.
A famous saying in structured programming circles is "After goto
s,
pointers are the next most dangerous element in a programming language."
A similar saying is "Pointers are to data structures what goto
s
are to control structures." In other words, avoid excessive use of
pointers. If pointers and goto
s are bad, then the indirect
jump must be the worst construct of all since it involves both goto
s
and pointers! Seriously though, the indirect jump instructions should be
avoided for casual use. They tend to make a program harder to read. After
all, an indirect jump can (theoretically) transfer control to any label
within a program. Imagine how hard it would be to follow the flow through
a program if you have no idea what a pointer contains and you come across
an indirect jump using that pointer. Therefore, you should always exercise
care when using jump indirect instructions.
[1] Versions of Turbo Pascal, sadly, treat
the case
statement as a form of the if..then..else
statement.
[2] Note that state machines need not be software
based. Many state machines' implementation are hardware based.
[3] Actually, it remembers how many times, MOD 4, that
it has been called.
- 10.3 - CASE Statements
- 10.4 - State Machines and Indirect Jumps
- 10.5 - Spaghetti Code
Art of Assembly: Chapter Ten - 27 SEP 1996
[Chapter Ten][Previous] [Next]
[Art of Assembly][Randall
Hyde]